fix(proxy): add backpressure handling to prevent hang on large responses#16
Conversation
bentsku
left a comment
There was a problem hiding this comment.
Nice one, this is actually a bug I've introduced while trying to fix the busy waiting, I didn't realize at the time to call selector.modify to remove the always set EVENT_WRITE from before.
I can see that changing events from the selector is definitely the way, found this online too.
I introduced it with #4, but at the time the events were never changed and were set to both read & write, so the selector would busy loop. This dynamic selection is much better, as it's definitely true that the next_conn (naming is hard) might be stuck for writes.
Nice find! LGTM 👍
Let's maybe update the setup.py for the version and the changelog and let's get it released!
| conn.events = selectors.EVENT_READ | ||
| self.selector.modify(sock, selectors.EVENT_READ, data=conn) |
There was a problem hiding this comment.
tiny minor nit: in the last self.selector.modify, we call it with conn.events, I guess because it is a merge of both write and read and it's long. Not sure if we want to keep it for consistency, but not blocking tbh; it doesn't really matter
When forwarding large responses, the proxy's send to the destination socket can block: the receiver's TCP buffer fills up, which fills the proxy's send buffer, which stalls the sender via flow control. With no EVENT_WRITE handling the proxy had no way to retry the stalled send, causing a deadlock for responses larger than the TCP send buffer (~400KB on macOS, ~256KB on Linux). Catch BlockingIOError explicitly (before the generic OSError handler) and register the socket for EVENT_WRITE so the selector retries the flush when buffer space becomes available. Also add return guards after connection close in the EVENT_READ path to prevent fall-through into the now-stale redirect_conn state. Add test_various_payload_sizes covering 1B, 1KB, 100KB, 1MB, 10MB and 10k/100k rows, over both plain and SSL connections, to catch regressions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ssl.SSLSocket.send() raises ssl.SSLWantWriteError (not BlockingIOError) when the underlying TCP buffer is full on a non-blocking SSL socket. SSLWantWriteError is a subclass of OSError, so it was caught by the generic connection-close handler, closing the connection mid-response. The client socket stayed open, leaving the caller hanging indefinitely. Catch SSLWantWriteError alongside BlockingIOError in both send paths so SSL connections correctly register EVENT_WRITE and retry when buffer space becomes available. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0ea4e3d to
780992b
Compare
Summary
EVENT_WRITEhandling the proxy had no way to retry the stalled send.BlockingIOErrorandssl.SSLWantWriteError(both subclasses ofOSError) were being swallowed by the generic connection-close handler, tearing down the connection instead of retrying.SSLWantWriteErroris raised by non-blocking SSL sockets when the underlying TCP buffer is full — distinct fromBlockingIOErroron plain sockets.Changes
proxy.pyBlockingIOError/ssl.SSLWantWriteErrorexplicitly beforeOSErrorin both send paths; registerEVENT_WRITEon the destination socket so the selector retries the flush when buffer space is available.EVENT_WRITEhandler onconnitself to drain any backloggedout_bytesafter a previous partial-send stall.returnguards after connection close in theEVENT_READpath to prevent fall-through into staleredirect_connstate (double-unregister / use-after-close).tests/test_proxy.pytest_various_payload_sizesparametrized over 1 B, 1 KB, 100 KB, 1 MB, 10 MB and 10 k / 100 k rows, run over both plain and SSL connections. Without the fix the 1 MB+ cases hang.Test plan
pytest tests/test_proxy.py -k test_various_payload_sizes— all 14 cases passpytest tests/test_proxy.py— existing tests unchanged🤖 Generated with Claude Code